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Abstract 

A new statistical technique for constructing linear latent structure (LLS) models 
from available data, supported by well established theoretical results and an efficient 
algorithm, is presented. The method reduces the problem of estimating LLS model 
parameters to a sequence of linear algebra problems. This assures a low computa- 
tional complexity and an ability to handle large scale data that involve thousands of 
variables. An overall computational scheme and all its components are discussed in 
detail. Simulation experiments demonstrate the excellent performance of the algo- 
rithm in reconstructing model parameters. Step-by-step analysis of a demographic 
survey is presented as an example. 

The technique is useful for the analysis of high-dimensional categorical data (e.g., 
demographic surveys, gene expression data) where the detection, evaluation, and 
interpretation of a underlying latent structure are required. 

Key words: Latent structure analysis, mixing distributions, high-dimensional 
categorical data, simulation experiments, methods of linear algebra 



1 Introduction 



Categorical data often occur in beliavior sciences, epidemiology, genetics, ed- 
ucation, and marketing. A common feature of modern categorical data collec- 
tion (e.g., demographic surveys or gene expression data) is their high dimen- 
sionality with respect to both to sample size and the numbers of categorical 
variables measured for each individual. 
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Limited numbers of categorical variables may not precisely characterize an 
object. This can be due to the stochastic nature of observed objects, their 
heterogeneity, or measurement error. Often the phenomenon of interest can- 
not be observed directly, so a series of measures has to be made, each related 
to latent variables of interest. As a consequence, the set of categorically scored 
measTirements is strongly related. Researchers analyzing such data have to: i) 
extract the signal from data with a large stochastic component, ii) extract a 
latent (unobservable) structure, related to the categorical data, and iii) pa- 
rameterize the latent structure to interpret it substantively. The identification 
and description of a latent structure are not sufficient when estimation of in- 
dividual parameters are required. Such problems are informally solved by a 
physician making a diagnosis: he/she estimates an unobservable variable of 
interest (presence of a disease) based on categorical measures (description of 
symptoms and answers to physician's questions) and estimated population 
characteristics (physician's professional experience). Likewise, having a set of 
categorical measurements (e.g., a demographic survey) made on individuals, a 
researcher would like to discover i) properties of the population and ii) proper- 
ties of the individuals. Methods which attempt to achieve these simultaneously 
are known as latent structure analysis. 

One of the best known such methods is the latent class model (LCM), which 
can be characterized as a statistical method for finding subtypes of related 
cases (latent classes) from multivariate categorical data (Lazarsfeld and Henry, 
1968; Goodman, 1974; Clogg, 1995). In LCM, individuals are assigned to one 
of several homogeneous classes. This requires estimation of the individual la- 
tent variable (class number). Other specific models of this family such as item 
response theory and Rasch models, discrete latent class models (Heinen, 1996), 
latent distribution analysis (Mislevy. 1984; Uebersax & Grove, 1993; Qu, Tan 
& Kutner, 1996), differ by the assumptions made about the latent variable(s). 
One method for identifying the latent structure in large categorical data sets 
with simultaneous evahiation of individual scores in a state space is Grade 
of Membership (GoM) analysis. GoM was introduced in Woodbury and Clive 
(1974); see Manton et al. (1994) for a detailed exposition and additional ref- 
erences. Statistical properties of GoM models were analyzed by ToUey and 
Manton (1992). 

All these models belong to the family of latent structure analysis that considers 
a number of categorical variables measured on each individual in a sample with 
the intent of discovering the properties of both a population and individuals 
composing the population. Different approaches to this general problem, and 
recent developments, are described in reports collected in Hagenaars and Mc- 
Cutcheon (2002). The relation of specific latent structure models is discussed 
by Uebersax (1997) and Erosheva (2005). Comparative analyses of methods of 
LSA and traditional statistical approaches (Agresti 2002) attempting to find 
the correspondence of population parameters of interest and sample statistics 
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are presented by Powers and Xie (2000). 

Methods of latent structure analysis can be reformulated in terms of statistical 
mixtures and mixing distributions. Bartholomew (2002) noted the theory of 
mixing distributions as a common mathematical framework of latent structure 
models with abilities "to clarify, simplify, and unify the disparate developments 
spanning over a century." The main assumption of the latent structure anal- 
ysis is the local independence assumption. Being formulated in terms of the 
theory of mixing distributions, it states that the observed joint distribution (of 
categorical random variables) can be described as a mixture of independent 
distributions. The mixing distribution is considered as a distribution of the 
latent variable(s) that contains latent information regarding the phenomenon 
under study. Specific models of latent structure analysis vary by assumptions 
regarding properties of the mixing distribution. 

LCM, GoM, and many other methods of latent structure analysis use max- 
imum likelihood for parameter estimation. Although a part of parameters 
describing a latent structure are regular (structural) parameters, parameters 
corresponding to individual states are so-called "nuisance" parameters, the 
number of which may increase with sample size. There are a series of non- 
trivial mathematical tasks which have to be solved to estimate such individual 
parameters. First is the problem of identifiability, which is especially difficult 
for large samples and a large number of estimated parameters. Second is the 
problem of consistency, which is very difficult to prove when structural and 
nuisance parameters are estimated simultaneously. Third is the availability 
and quality of algorithms to perform estimations. Often additional assump- 
tions not required by a model are made to facilitate likelihood maximum. 
No general theorems arc available to address these questions, so all of these 
points have to be considered separately for each task involved in dealing with 
nuisance parameters (e.g., Haberman, 1995; Erosheva, 2002). 

Recently Linear Latent Structure (LLS) analysis has been proposed to model 
high dimensional categorical data (Kovtun et al., 2005a,b,c). LLS has been 
formulated using mixing distribution theory. Similar to other latent structure 
analyses, the goal of LLS analysis is to derive both the properties of a popula- 
tion and individuals using discrete measurements. LLS, however, does not use 
a maximum likelihood approach for parameter estimation. Instead, it uses a 
new estimator, where LLS parameter estimates are solutions of a quasilincar 
system of equations. For this estimator, it is possible to prove consistency, to 
formulate conditions for identifiability, and to formulate a high-performance 
algorithm allowing one to handle datasets involving thousands of categorical 
variables. 

The LLS approach is briefiy reviewed in Section 2. Specifically, the structure of 
the data base, the LLS problem formulations, their basic properties, and two 
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examples are discussed. A detailed description of the LLS algorithm and its 
implementation is in Section 3. Attention is paid not only to the components 
of LLS analysis, but also to auxiliary algorithms, and to procedures for choos- 
ing a basis which may require prior information about the study phenomenon. 
Section 4 is devoted to apphcations, and has two parts. The first contains a 
description of simulation experiments designed to check the quality of recon- 
struction of model components. The second discusses how the model is applied 
to a specific dataset. Section 5 summarizes results and includes discussion and 
conclusions. 



2 Linear Latent Structure Analysis 



2. 1 Structure of datasets and population characteristics. 



The typical dataset analyzed by methods of latent structure analysis can be 
represented by the I x J matrix constituted by categorical outcomes Xj of 
J measurements on / individuals, where i = and j = 1, . . . , J run 

over individuals and measurements, respectively. Each row in the matrix cor- 
responds to an individual and contains an individual response pattern, i.e., a 
sequence of J numbers with the jth number running from 1 to the number 
of responses Lj for that variable. In most cases Lj ranges from 2 to 5-10, and 
rarely exceeds several dozens. Thus, the results of a survey are represented 
by / measurements of random variables Xi, . . . , Xj, with the set of outcomes 
of the jth measurement being {l,...,Lj}. The joint distribution of random 
variables Xi, . . . , Xj can be described by the elementary probabilities, 

pe^P {Xi = £i and • • • and Xj = £j) , (1) 

where i — {ii, ij) is an individual response pattern and £j e {1, ...,Lj}. To 
include into consideration marginal probabilities, we allow some components 
of i to be O's. For example, for three binary variables, 

P(2,o,i) = P{Xi = 2 and X^ = 1) = P(2,i,i) +P(2,2,i) 

Values of these probabilities pe (and only these) are directly estimable from 
the observations. If li is the number of individuals with pattern i, consistent 
estimates for pi are given by frequencies fi — Ij^/I. 
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2.2 LLS task: statistical, geometrical, and mixing distribution points of view. 

The problem in LLS analysis is to evaluate dimension of a hidden space, iden- 
tify its location in the space of larger dimension, and to evaluate hidden in- 
dividuals' characteristics (coordinates in the latent sub-space) from the data. 
The LLS analysis is based on two assumptions. The first is the assumption 
about "local independence" , which is common for all methods of latent struc- 
ture analysis. The second is specific for LLS analysis. It is about existence 
of low-dimensional linear subspace associated with the latent structure. We 
present LLS in terms of the theory of mixing distributions, and then discuss 
its specific assumption from statistical and geometrical points of view. 

Population characteristics are completely described by the joint distribution 
of random variables Xi, . . . , Xj presented by probabilities (1). Among all pos- 
sible joint distributions, one can distinguish independent distributions, i.e. 
distributions satisfying, 

Pi^P {Xi = ^1 and • • • and Xj = ij) = JJ. P {Xj ^ Q. (2) 

The description of an independent distribution law requires only knowing 
P {Xj = ij) denoted below as Pji. Vectors of probabilities (3 = {(3u, • • • , PjLj) 
belong to vector space i?!-^', where \L\ = J2j Lj. Indexes of the vector compo- 
nents run over all possible pairs of jl, i.e., corresponding to probabilities of the 
first outcome to the first question, of the second outcome to the first question, 
and so on. Requirements for (3ji to be probabilities restricts their domain in 
the vector space by 

^f3ji = l and (3ji>0. (3) 
1=1 

This domain represents the direct product of J unit simplex of dimensions Lj. 

Since variables Xi, . . . ,Xj in general case are not independent, the observed 
distribution {p^} cannot be described by the product of independent distribu- 
tions, but it can be exactly described as a mixture of independent distributions. 
This means that each set of independent probabilities contributes to observed 
distribution with a weight function. This weight function is called mixing dis- 
tribution. It is defined in the space of independent distributions, i.e. for each 
vector of probabilities f3 satisfying (3). Let F{f3) be the cumulative distribu- 
tion function of the mixing distribution. Probabilities pi are represented as, 

pe^j dF{f3) n'^^ (4) 

Thus, latent structure analysis searches for a representation of the observed 
distribution as a mixture of independent distributions. 
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Any distribution {pi} can be represented as a mixture, so representation (4) 
does not restrict the family of distributions and further specifications are re- 
quired. They are formulated as restrictions on the support of mixing distri- 
bution or, equivalently, on a set of mixed independent distributions. The LLS 
specific assumption is that this set is restricted to be a X-dimensional linear 
subspace of the space of independent distributions, i.e., the mixing distribution 
is supported by the linear subspace spanned by K basis vectors A^, . . . , . 
Below this LLS assumption is considered from the point of view of pure sta- 
tistical analysis and the geometry of the task. 

Individual characteristics are described by individual probabilities = ^i-^j ~ 
I) of specific outcomes (i = 1, . . . , / runs over individuals). 

The LLS assumption about the existence of a low-dimensional linear subspace 
supporting the mixing distribution is essentially equivalent to the assumption 
that there exists a i^- dimensional random vector G such that for every j a 
regression of Yji (random variable Yji equaling 1 if X,- = / and otherwise) on 
G is linear. The regression equation relates the expectation of Yji, which is Pji, 
to the random vector G. If a specific value of G is associated with individual 
i (so-called LLS scores Qik), then the regression takes the form, 

P}i = J: 9.k>^i. (5) 

fe=l 

The sense of the regression coefficients A*^; and model restrictions is clarified 
by analyzing the geometry of the LLS task. 

Vectors of individual probabihties = of individual responses — 

{Yji} and the regression coefficients A'^ = {A^J lie in the permitted domain (3) 
of the space of independent distributions. From a geometric point of view, LLS 
searches a fT-dimensional subspace (represented by A^J in this space, which is 
the "closest" to the set of / points representing individual outcomes Y-i- This 
hnear subspace is defined by its basis A^, . . . , A^, so to find the X-dimensional 
subspace means finding a basis, Aj^^, {k = 1,. . .,K). Every basis A-*^, . . . , A''^ 
defines a family of regression coefficients and vice versa. The complete set of 
restrictions in the LLS task allowing to consider and A^; as probabilities, 
is, ^ 

^Aj, = l, Aj,>0, Y.9ik^l and Y.9ik\%>Q. (6) 

1=1 k=l k 

LLS scores characterizing an individual i are then estimated as the ex- 
pectation of vector G, conditional on the respondent's answers. Basis vectors 

of the subspace can be interpreted as probabilities and can define the "pure 
types" (Manton et al., 1994). In this sense, the model decomposition (5) has 
the interpretation of a decomposition over pure types or over "ideal persons" 
whose individual probabilities are basis vectors of the subspace. 
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Summarizing, one can say that the LLS model approximates the observed 
distribution of Xi, . . . ,Xj by a mixture of independent distributions with a 
mixing distribution supported by a i^- dimensional subspace of the space of 
independent distribution. To specify such a model distribution it is sufficient 
to define the following LLS parameters: 

(1) A basis A^, . . . , of the space that supports the mixing distribution. 

(2) Conditional moments E(G|X = i). 

This set of model parameters is not the only set possible. We chose it because 
of a number of useful properties listed below. 

Property 1. The mixing distribution can be estimated in the style of an 
empirical distribution, i.e., when the estimator is a distribution concentrated 
in points E{G\X — £) with weights fe. 

Property 2. The conditional expectations E{G\X = £) provide knowledge 
about individuals. These conditional expectations can be considered as co- 
ordinates in a phase space, to which all individuals belong. The ability to 
discover the phase space and determine individual positions in it is a valuable 
feature of LLS analysis. 

Property 3. When the number of measurement, J, tends to infinity, the 
individual conditional expectations gi = E{G\X = i^^''), where i^^^ is the vector 
of responses of individual i, converge to the true value of the latent variable 
for this individual, and estimates of the mixing distribution converge to the 
true one, provided some regularity conditions (Kovtun et al., 2005d). 

2.3 Moment matrix and the main system of equations. 

Parameter estimation is based on two facts (Kovtun et al., 2005a,b) formu- 
lated in terms of the conditional and unconditional moments of the mixing 

distribution. The first is that columns of moment matrix belong exactly to 
the desired linear space. The second is that they obey the main system of 
equations. 

2.3.1 Unconditional moments and the moment matrix 

The first set of values in which we are interested consists of the unconditional 
moments of the mixing distribution F{(5), 

= J dF((3)Y[^,^^^^(3je,Pe (7) 
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Note an important fact regarding the above equation. The value on the left- 
hand-side, M^, is a moment of mixing distribution, while the value on the right- 
hand-side, Pi, comes from the joint distribution of Xi, . . . , Xj; the equality of 
these values is a direct corollary of the definition of mixture. The existence of 
their connection between two distinct distributions is crucial for LLS analysis. 

The first corollary of eq. (7) is that the unconditional moments are directly 
estimable from data and, therefore, the frequencies fi of response patterns £ 
observed in a sample are consistent and efficient estimators for conditional 
moments M^. 

Recall that we allow some components of response pattern i to be 0. In this 
case pi axe marginal probabilities. In the definition of the multipliers, 
corresponding to components of i, are excluded from the product. Thus, the 
order of moment is equal to the number of non-zero components in £. 

All moments defined in (7) are estimable by frequencies; however, this defini- 
tion docs not cover all moments of a certain order. For example, moments of 
second order with and f3ji2, (i.e., with the same j) are not estimable. This 
arises because the data do not include double answers to the same question. 
One can notice that i) all moments of first order are estimable, ii) the pro- 
portion of estimable moments decreases with the increase of order, and iii) no 
moments of order J + 1 and higher are estimable. 

The moment matrix is constructed from moments of order up to J using the 
following formal rules: 

(1) Rows of the moment matrix are indexed by response patterns contain- 
ing exactly one non-zero component or, equivalently, by pair indexes jl. 
Thus, the moment matrix contains \L\ rows, and their columns can be 
considered as vectors in i?I^L 

(2) Columns of the moment matrix are indexed by all possible response pat- 
terns, including a response pattern containing all O's. The first column is 
indexed by response pattern (0, . . . , 0); the next \L\ columns are indexed 
by response patterns containing one non-zero component, and so on. 

(3) The element on the intersection of row i' and column i" is M^'+^", if i" 
has at the position of the only non-zero component of i' (in this case, 
£' + £" is a meaningful response pattern; otherwise, the question mark 
is placed on the position of intersection of row i' and column i"). For 
example, the element of the moment matrix in row (1, 0, 0) and column 
(0,2,2) is Mi^2,2, and element in row (1,0,0) and column (1,0,2) is a 
question mark. 

Equation (8) gives an example of a portion of the moment matrix for the case 



8 



of J = 3 dichotomous variables, i.e., Li — L2 — L3 — 2. 
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In this example, places for inestimable moments are filled by question marks. 
The first column of the moment matrix contains moments of the first order, 
when only one specific outcome of one specific question is taken into account. 
There are no inestimable moments in the first column. Elements of this column 
can be denoted as components of vectors in i?'^', i.e., as Mji. The next six (|L| 
in general) columns correspond to second-order moments. Blocks of diagonal 
elements are not estimable. Second-order moments can be also denoted via 
pair jl of indexes as Mji-jur. The last shown column corresponds to third order 
moments. The notation Mji and Mji-jqi is used below for specific columns of 
the moment matrix. 

The part of the moment matrix consisting of second-order moments (which is 
\L\ X \L\ square matrix) together with the column of first-order moments is of 
special interest. A well-know fact is that if a distribution in an n-dimensional 
Euclidean space is carried by a /c-dimensional linear manifold, then the rank 
of the covariance matrix is equal to k, and the position of the manifold can 
be derived from the covariance matrix. This fact is the cornerstone of prin- 
cipal component analysis. Our method is based on similar ideas, adapted to 
having an incomplete set of second-order moments. For a small J (as in the 
example), there is a relatively large fraction of non-estimable components in 
the second-order part of the moment matrix. For increasing J, this fraction 
rapidly decreases. 

For a moment matrix M let its completion M be a matrix obtained from M by 
replacing question marks with arbitrary numbers. The main fact with respect 
to the moment matrix is that the moment matrix always has a completion in 
which all columns belong to the supporting plane A. Thus, if the estimable part 
of the moment matrix has sufficient rank (which is the case in nondegenerate 
situations,) a basis in A may be obtained from this matrix. As we have a 
consistent estimator of the moment matrix in the form of a frequency matrix, 
the supporting plane may be consistently estimated. 

The mixing distribution in the LLS model is supported by the intersection of 
the linear subspace with the polyhedron defined by (3). This intersection Pp 
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can be parameterized by g^s in accordance with (5). In Appendix, we demon- 
strate that integrals in (7) over can be reduced to integrals in the linear 
subspace over the polyhedron Pg, and how distribution functions P(/3) and 
F{g) relate to each other. Then unconditional moments can be formally 
represented as 

Me^Pe^ldF(g)U^,^^^^j:,9kX% (9) 

Representation (9) helps to illustrate the main properties of the moment ma- 
trix: i) the rank of the matrix is K and ii) all columns of the matrix belong 
to a linear subspace with a basis represented by vectors A'^. Indeed, since jl 
indexes running in each column of the moment matrix are only involved in 
A's, which are independent of integration variable g, and, therefore, which can 
be removed from the integral. The remaining integration over g is common 
for all elements of the moment matrix of a certain order. The first column and 
second order matrix are represented. 



Mji = / dF{g) E g,X^, = ^ / dF{g)gA ■ \% = ^ 
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k \ k' ) k 



(10) 



where and J^j/;/ are some numeric coefficients. From (10) we see that each 
column of moment matrix can be represented as a linear combination of K 
vectors A'^. 



2.3.2 Unconditional moments and main system of equations 

Another set of the values of interest are the conditional moments E{Gk\X = i), 
which express knowledge of the state of individuals based on measurements. 
They are not directly estimable from observations. The goal of LLS analysis 
is to obtain estimates for these conditional moments. Explicit expressions for 
those of the lowest order are obtained using the Bayes theorem (Kovtun et al. 
2005a), 

E(G.|X = £) = / dF{g)gJ^^^^^^^^ (11) 

Analogously, higher conditional moments, including products of components 
of G, can be constructed. As can be seen exphcitly from (34) and (11), the 
relation of conditional and unconditional moments in LLS analysis can be 
described as, 

E,AJ,.E(G',|X = £)^, (12) 
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where vector (, contains in position j, and ^ + Ij contains / in this position. 
Similar equations can be written for conditional moments of higher orders. 
We refer to the system of equations relating conditional and unconditional 
moments as the main system of the equation. Kovtun et al, (2005a) proved 
the following properties of solutions of the main system of equations: i) any 
basis of A together with conditional moments E{Gk\X = £) calculated on 
this basis give a solution of the main system of equation; and ii) under regular 
conditions, every solution of the main system of equations gives a basis of A 
and conditional moments calculated in this basis. Note, that equation (12) is 
linear with respect to conditional moments. 

The described properties of the moment matrix and solutions of the main 
system of equations suggest an efficient algorithm to obtain LLS estimates. 
First, a basis of the supporting plane can be obtained from the moment matrix, 
and second, conditional moments can be found by solving a linear system of 
equations. An important property of conditional moments is that (Kovtun et 
al., 2005d): if a population is represented by a set of true LLS scores, then 
unconditional moments give good approximation for them for finite J, and 
converge to true LLS scores when J — > oo. 



2.4 Two illustrative examples. 



Before going into detail for the algorithm and to realistic tasks of data anal- 
ysis, we consider two simple illustrative examples. For both of them, assume 
K = 2, three dichotomous variables (J = 3), and the basis vectors are 
= (1,0; 1,0; 1,0) and = (1/2, V2; 0, 1; 0, 1). Then the independent dis- 
tributions being mixed are defined by vectors: 



Thus, a mixing distribution can be given one dimensional p.d.f. p{gi). For the 
first task, we assume that the mixing distribution is uniform {p{gi) = 1 ■d{gi) ■ 
d{l — gi). In the second case we assume the mixing distribution is concentrated 
at two points with g^ = 0.1 and gi = 0.4 (^(5-1) = V2[5(5'i - Vio) + ^(s'l - Vs)])- 
Unconditional moments are calculated using (7). Moment matrices for both 




< ^1 < 1. 



(13) 
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Since these matrices were constructed from mixing distributions known a 
priori, diagonal blocks in the sub-matrix of the second order are calculable 
(marked by the italic font in (14)). As one can see, the rank of both these 
matrices is 2. Conditional moments are calculated for an outcome pattern. 
Choose I = (001) and £ + /i = (101). Using (11) we have, 

£{Gi\X = (001)) = 2/3 and £{G2\X = (001)) = 1/3 (15) 

for the first example and, 

E(Gi|X = (001)) = 17/50 and E(G2|X = (001)) = 33/50 (16) 

for the second. Using corresponding elements of Mi in (14) (marked by bold 
text) we can see that l.h.s. and r.h.s of eq. (12) equal to 5/6 for first example 
and 67/100 for the second: 



3 2 3 



5/12 
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and 
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External indexes in this example are j — 1 and I — 1. 



3 Algorithms of Lineair Latent Structure Analysis 



Parameter estimations in LLS models are based on properties of the moment 
matrix and the main system of equations. These properties allow us to reduce 
a problem of estimating model parameters to a sequence of linear algebra 
problems. The algorithm based on linear algebra methods assures a low com- 
putational complexity. 

Data to be analyzed are represented by a set of measurements Xj (See section 
2.1). Finding a linear space and individual LLS scores is required. Estima- 
tion of the model includes four steps: i) estimating the rank of the frequency 
matrix, ii) finding the supporting plane, iii) choosing a basis in the found 
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plane, iv) calculating individual conditional expectations and estimating mix- 
ing distribution. The second and fourth steps are the essence of LLS parameter 
estimation problem. The first step is defined as separate because sometimes 
the desired dimensionality of the LLS model may be provided by a researcher, 
and this step may be skipped. The third step requires using prior information 
about the processes studied, so it is also examined separately. 



3.1 Moment matrix calculation 



An important preceding step that deserves special attention is the moment ma- 
trix calculation. The elements of the moment matrix given by are approx- 
imated by observable frequencies defined as = h/ 1) where is the number 
of individuals with outcome pattern and / is the total number of individuals 
having certain (not missing) outcomes for nonzero elements in I. Columns of 
a different order have different normalizations, e.g., the sum of first-order mo- 
ments corresponding to question j is one (e.g., M(oio) + -M(o2o) = 1); while sums 
of columns for this j of the second-order sub-matrix are equal to correspond- 
ing first-order moment (e.g., M(iio) -|- M(i2o) = -M(ioo)). General conditions of 
summations of the second order moments written in terms of notation defined 
after eq. (8) are, 

Y: M,,,n' = M,i. (18) 
i'=i 



Because of missing data, the property of normalization can be violated. This 
property, with or without the renormalization making the sums equaling to 
one, is required for the analysis. The renormalization could provide the prop- 
erty in the case of presence of missing data, however, this approximation can 
be true only assuming missing data are random. More detailed discussion of 
procedures to deal with missing data are discussed in Section 3.9. 

In addition, a matrix containing standard errors (or confidence intervals) of 
estimates of frequencies is calculated for each element of the frequency matrix. 
Standard errors for binomial distribution, i.e. ae = \J fi{l — ft)! hi require 
generalization for patterns with small as discussed in Brown et al. (2001). 
A generalization based on Wilson's approach (Brown et al., 2001) uses. 



w 



2 

2 ''a/2 



± 



a/2 



h + z. 



a/2 



\ 



(19) 



Eq. (19) recovers the standard Wald's estimates of CIs, i.e., Clg = Pc^Za/2<^E'i 
Za/2 = $^^(1 — q;/2), where $(a;) is the standard normal distribution function 



and a denotes the confidence level. 
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3.2 Computational rank of the frequency matrix 



The frequency matrix can be presented as a sum of the moment matrix with 
rank K and a matrix with a stochastic component. To define the dimensional- 
ity of the LLS problem, we have to estimate the rank of the frequency matrix 
eliminating the stochastic component. Specifically, wc take the greatest minor 
of the frequency matrix that does not contain question marks. Then we calcu- 
late the singular value decomposition (SVD) and take K equal to the number 
of singular values that are greater than a maximum of the total standard devi- 
ation estimated as the quadratic sum of standard errors of frequencies involved 
in the minor. 

The choice of a minor does not essentially infiuencc the computational rank 
of the frequency matrix. Indeed, the geometrically specific choice of a minor 
(e.g. a n-dimcnsional minor of maximal size in left low corner of moment 
matrix) corresponds to projection of a part of vectors onto n-dimensional 
linear subspace. If the real rank of the moment matrix is much less than n, it 
is clear that the rank of the projections does not change. 



3.3 Finding the supporting plane 



All columns of the moment matrix belong to the supporting plane, and as 
the frequency matrix is an approximation of the moment matrix, a natural 
way to search for the supporting plane is to search for a plane that mini- 
mizes the sum of distances from it to the columns of the frequency matrix. 
In our case, however, this way is complicated by: (a) the frequency matrix 
is incomplete; (b) the statistical inaccuracy of approximation of moments Mi 
by frequencies varies considerably over elements of frequency matrix; and 
(c) a sought basis should exactly satisfy conditions Y^di ^ji — 1 for every k 
and j. The current prototype of the code overcomes these obstacles by using 
some heuristic methods: (a) An iterative procedure for completion of the fre- 
quency matrix is used: after a basis of supporting plane is obtained, it is used 
to recalculate completion of the frequency matrix. A new frequency matrix is 
used for adjusting basis calculation etc. (b) Only the first and second order 
moments are examined, so statistical errors of different columns in this matrix 
are compatible, (c) Rotation of each simplex (corresponding to each question) 
to the hyperplane to eliminate one degree of freedom. Rotation, but not a 
simple projection, is required to provide the same distances between points in 
a simplex. Items (a) and (c) require explicit consideration. 
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3.3.1 Completion of the moment matrix 



We consider the second-order moment matrix where for every j there are unde- 
fined elements corresponding to repeated answers to the same question. The 
intent of completion procedure is to approximate these elements, assuming 
that the supporting subspace A is found. Since only the completed frequency 
matrix is used for finding subspace A, and since the completion procedure 
uses a basis in the sought subspace A, it can be done within the iteration 
procedure. For one iteration step, it is required to find a symmetric matrix 
Bj of L-j X Lj-dimension with positive elements and the required summation 
conditions such that the sum of elements in a column (or in a row) equals 
to the corresponding moment of the first order, i.e., J2iBj iii = Mji,. Since we 
know first- and second-order frequencies {fji and fjij'i'] j 7^ j'), which only 
approximate exact moments {Mji and Mjijui), special efforts are required to 
process the properties of Bj. Columns of the second-order sub- matrix corre- 
sponding to question j are presented using known frequencies fjijf, j 7^ j and 
inestimable elements Bj ii, 



f: 



B 



\fjLj;jl ■ ■ ■ JjLj;jLj j 



f 



flLv,jl ■ ■ ■ flLn 



B 



flljl ■ ■ ■ fjl;jL~. 



fj 



(20) 



The completion procedure is based on the fact that the rank of the moment 
matrix is K, which is much smaller than the dimension of matrix \L\. There- 
fore, only K columns are linearly independent. Each column of the moment 
matrix, being a vector in /T-dimensional vector space, can be expanded over 
basis vectors A^,...,A^ available after finding the subspace A. Known el- 
ements fji-ji (J = l,...,Lj and j 7^ j) of columns of the moment matrix 
corresponding to question j are expanded, 

te^E^^'Aj^ ij^j)- (21) 

k 
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If coefficients are found, matrix B-j can be constructed as B-j — J^Cl A|p, 

k ■' 

The number of known components of a vector f^i.^j is greater than the number 

of basis vectors, so coefficients C^^ can be calculated by ordinary least squares 
methods minimizing residuals. Matrix Bj has to be obtained exactly symmet- 
ric, using minimization with restrictions. Lagrangian Lg-j are written for each 

~h 

I \k / l^l' k ^ ■' ■' ' 

where pi and p/j/ are Lagrange multipliers for equality conditions, and pi^ — for 
inequality conditions. The optimization task to find with a quadratic min- 
imizing function and linear equality .and inequality conditions arc a quadratic 
programming problem. This problem is used by difi'erent parts of the algorithm 
(see eqs. (27) and (31)), so Section 3.7 is devoted to this calculation. 



3.3.2 Removing restrictions 

The restrictions Z^^^i A^; = 1 are removed by reducing the number of rows by 
J (one for every group of indexes jT, . . . , jLj). Specifically, we use a linear map 
from to i?'^'"'^ represented by a block-diagonal matrix A with J blocks 
of size Lj X [Lj — 1): 




(23) 



Geometrically, such a map provides isometric rotation (A'^ = AX'^) to the 
hyperplane with zero first coordinate, i.e., (every block Aj defines a rotation 
of a unit simplex in Lj— dimensional space around a hypersurface opposite to 
the first vertex; the angle of this rotation is such that the first vertex moves 
to the point where the first coordinate equals 0). Explicitly, this rotation is 

Xji_^ = AjXji in matrix form or Xji^i = Xji - ^^^^ A^^ for / = 2,. . . ,Lj. 

New vectors A'^ do not possess any ties. It is easy to ascertain that such a 
transformation really conserves distances between points in a simplex. The 
reverse transformation is. 



1-E 



1=2 -^jl- 



+ 



L,-l 



(24) 
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3.3.3 Algorithm for identifying the subspace 

The initial completion of the moment matrix is constructed in a arbitrary way, 
6. g, by the unitary diagonal matrix or completing by frequencies as fij = fifj. 
The next preliminary step is the rotation of each simplex (corresponding to 
each question as described above) to the hyperplane to eliminate one degree 
of freedom. This produces n points c^, . . . , c" (images of columns of frequency 
matrix) in m = (|L| — J) dimensional space. The problem is to find an affine 
plane that minimally deviates from these points in the space of individual 
probabilities. First, we find the center of gravity of this system 

c'-IEc\ (25) 

and then consider a new set of points c* = c* — c°, that corresponds to shifting 
the point of origin. Now we need to find a X-dimensional linear subspace 
in i?™ that minimally deviates from this set of points. The solution of this 
problem is well-known (see, e.g., chapter 43 of Kendall and Stuart, 1977): one 
has to consider anmxm matrix X with componentsX^^ = J2i KK'^ ^^i^ matrix 
is symmetric and positively defined, and thus its normalized eigenvectors are 
composed of an orthonormal basis in i?™. Let 7i > 72 > ■ ■ ■ > 7m > be 
eigenvalues of matrix X, and let , . . . , z™- be corresponding eigenvectors. The 
plane of dimensionality K that minimizes the sum of squared distances from 
points c^, . . . , c"" is spanned by 2;^, .... 2;™, and the sum of squared distances 
is tr X — J2k=i Ik- Vectors , + , . . . , c'^ + z^~^ give us an affine basis of 
the sought affine plane. Finally, we apply inverses of transformation (24) to 
c°, + 2;^, . . . , + z^~^ to obtain the sought basis A^, . . . , of the subspace 
A. 

3.4 Choice of a basis 

The basis cannot be defined uniquely, and any convex combination keeping the 
LLS restrictions can be considered an alternative. A choice may be made using 
prior information about the process of interest. The appeal of prior information 
at this stage is reasonable because of the evident fact that the same dataset 
can be used for analyzing different (say, disability or CVD) processes. 

The way how this information is used and how the procedure of specific choice 
of the basis is defined is a question of taste. We describe here two possible 
schemes implemented in the algorithm prototype. 

A researcher specifies the characteristics of "ideal" individuals based on his/her 
experience in the research domain. Then he/she can construct vectors of prob- 
abilities for such ideal individuals or take these individuals from the sample 
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under consideration. The vectors of probabilities of these individuals are taken 
as basis vectors. If probability vectors are constructed by hand, they could be 
beyond polyhedron Pg, so they should be projected to Pg. The individual co- 
ordinates in this basis would represent "proximity" of the individual to the 
"ideal" ones. 

In another scheme, the basis is obtained using assignment of LLS scores (calcu- 
lated on some arbitrary basis) to K clusters, and then basis vectors A^, . . . , 
are calculated as means of probabihties over each cluster. 

A researcher can develop his/her own scheme of basis selection. For exam- 
ple, he/she can simply use vectors already known from previous studies or 
construct a basis purely mathematically, e.g., from the condition of maximal 
linear independence of the vectors, or choose it from the set of the supportive 
polyhedron vertexes. 



3. 5 Calculation of individual conditional expectations. 

When a basis of the supporting plane is found, the conditional expectations 
can be found from the main system of equations (12), which is a linear system 
after substituting the basis. The system, however, relates conditional expec- 
tations E(Gk\X = i) for a pattern i with at least one outcome. Thus exact 
system of equations (12) can be written for all patterns £ except patterns 
where all outcomes are known. For the complete patterns, we can calculate 
J conditional expectations, subsequently excluding one of J questions (i.e., 
obtaining patterns where i^^ denotes vector £ with j'*'^ coordinate equal 
to 0), solving the exact system of equations for obtained patterns, and defin- 
ing LLS score for complete pattern as mean over J solutions for conditional 
expectations for i^^ patterns. This approach can be formalized by considering 
a system of J system of equations: 

E\V5^fe^4 (26) 
k h 

This is a sparse overdetermined system; methods for solving such systems 
are well-elaborated (see, for example, Forsythe et al. (1977), Kahaner et al. 
(1988)). Since J2k9ek = 1 and J^k'^'ji ' 9ek ^ have to be satisfied for any i, 
the Lagrangian for the optimization task has to include corresponding terms 
with Lagrange multipliers: 

L9e = E(E>^'r9ek-4]] +12PjiE>^jI ' 9ek + P (E9ek - l] (27) 

j \ k Je / jl k \ k ) 
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This is a qiiadratic programming problem. We used the method of antigradicnt 
projection to the polyhedron defined by the inequality and equality conditions 
with iterations as described in Section 3.7. The initial point is defined as 
g£k = It is possible to show that this initial point always belongs to the 
polyhedron. 

LLS scores can deviate restrictions J2k ^%'9ik > in limited numbers of cases; 
e.g., for a relatively small number of questions J < 100. Options to solve the 
optimization task without the inequality restriction are also included in the 
program. Keeping in (28) Lagrange multipliers for equality conditions only we 
have ^ 

^^^ = E(EA?r^^.--^) +p(e^^^-i) (28) 

The advantage of this task is that it can be solved explicitly, i.e.; without the 
search procedures required for the quadratic programming problem. The sim- 
plest way to deal with the restriction is to just substitute the K*^ component 
and apply SVD to a J x {K - l)-matrix Ajk = X'j^. - \f^.. If SVD has exphcit 
form as Kjk = J2k' ^jk'Wk'Vk'k, where A is J x (K — l)-matrix, W contains a 
vector of singular values, and V is an orthogonal matrix, then the result for 
conditional expectations is 

S«=E^V(%„,-AS)^ (29) 
for /c = 1, . . . , - 1 and = 1 - Ef=Y 9ek- 

3.6 Mixing distribution 

The mixing distribution for an analyzed set of data is approximated by empiri- 
cal distribution, where an individual gives a unit contribution to the histogram 
of the distribution. A support of this distribution is a set of / points. Prob- 
abilities of the joint distribution (4) are estimated as the sum over sample 
individuals or to the sum over possible outcome patterns. 

Pi - n,,,.^o - E,, h' n,,,.^o E. 9ek\%. (30) 

3. 7 Quadratic programming problem. 

The quadratic programming problem is used by the algorithm in the comple- 
tion procedure, in calculation of individual LLS scores, and in basis construc- 
tion. Corresponding expressions for Lagrangians are given in eqs. (22), (27), 
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and (31). In all cases, the function minimized is quadratic, and equality and 
inequality conditions are linear functions of the sought parameters. The ma- 
trix of quadratic form is positive definite in all cases. The solution is based on 
Kuhn- Tucker conditions which become sufficient in these cases. The method 
of antigradient projection is used in searching for the optimal point (Attetkov 
et al., 2001). The antigradient direction is defined by the direction opposite 
to the gradient in a current point. Then the antigradient is projected to a 
subspace defined by so-called active conditions: all equality conditions and 
those inequality conditions which are strictly satisfied (i.e., the current point 
belongs to the corresponding plane). If the absolute value of the projection is 
not zero then the current point is moved in this direction up to the intersec- 
tion with the closest boundary, or to the minimum of the quadratic function, 
whichever is closer. If the absolute value of the projection is zero, then the 
sign of Lagrange multipliers corresponding to active inequality restrictions is 
checked. If they all are positive, the current point is the minimum; otherwise a 
restriction corresponding to the negative Lagrange multiplier is removed and 
a new antigradient projection is calculated. 

3.8 Clustering 

Clustering LLS scores is not a necessary component for finding the subspace 
or calculation of LLS scores; however, it is helpful in selecting a basis and 
in cross-checking in simulation studies. Two types of clustering procedures 
(Manton et al., 2003) are implemented in the algorithm. The first is the k- 
means algorithm for the situation where the number of clusters is fixed a 
priori. The second is a hierarchical procedure, which sequentially joints clusters 
with minimal distance between them. Several distance definitions are possible: 
distance between centers of mass in clusters, between closest and the most 
outlying cluster members. Numerical comparison shows that the most reliable 
results are obtained for the center-of-mass scheme. 



3.9 Missing data 

Missing data are often difficult problems in statistical analyses. Missing data 
are generated by the absence of responses for an individual to specific ques- 
tions. The properties of LLS analysis make this kind of missing data relatively 
easy to handle. Two main sources of missing answers could be considered: first, 
when the failure to answer the question is random; and second, when the fail- 
ure to answer the question correlates with answers to other questions. In the 
first case (missing data are random), the solution is provided by the fact that 
the input of the LLS algorithm consists of frequencies of partial response pat- 
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terns (like the frequency of giving response C to the 2^'^ question and response 
A to the question). With missing data, such frequencies can be calculated 
by relating the number of individuals with a particular response pattern to 
the number of individuals who gave answers to the questions covered by the 
response pattern (rather than to the total number of individuals). The only 
drawback of this method is decreased precision of frequency estimators. As 
LLS scores are expectations of latent variables conditional on the arbitrary 
part of the response pattern for an individual, the available part of the re- 
sponse pattern can be used to estimate the value of the latent variable. In the 
second case (missing data are not random), the absence of an answer can be 
considered an additional alternative for answering a multi-choice question; in 
this case, standard LLS analysis can be applied. 



4 Applications 

Below we i) perform a series of simulation experiments to test the predictive 
power of LLS model and its ability to reveal and to quantitatively reconstruct 
a hidden latent structure and ii) demonstrate how the model performs when 
applied to a real dataset considering as an example a demographic survey with 
57 questions about disability and health status in 5,161 individuals (Manton 
and Gu, 2001). 



4-1 Simulation studies 

We perform three types of simulation experiments which demonstrate the 
quality of the reconstruction of i) linear subspace, ii) LLS score distribution, 
and iii) clusters in LLS score space. At the first stage of each simulation study, 
the dimensionality of the latent structure (strictly, dimensionality of the linear 
space supporting the structure in the space of individual probabilities) is cho- 
sen. Then, assuming a specific structure of questions and numbers of outcomes 
to each question (i.e., set of indexesj£j), basis vectors A^, . . . , A^and mixing 
distribution G are constructed from some arguments or simulated randomly. 
Then using individual probabilities calculated as in Eq. (5), outcomes of the 
specified number of individuals are simulated. The database generated is used 
as an input for LLS analysis to extract the supporting linear if-dimensional 
subspace and to estimate the mixing distribution via estimation of individ- 
ual conditional expectations. The comparison of original and extracted linear 
subspace and mixing distribution provides information about the quality of 
LLS analysis in conditions defined by /, J, and Lj. 
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4.1.1 Linear subspace 



We started with investigation of the quahty of reconstruction of hnear sub- 
space. Choosing initial 2-, 3-, and 5-dimensional subspaces with 60, 120, and 
240 dichotomous questions and assuming uniform distribution of LLS scores, 
we simulated 1,430 and 14,300 sets of individual responses. Then we recon- 
structed a linear subspace and compared it with the initial linear subspace. For 
such comparisons, we needed a procedure for the quantitative comparison of 
closeness of multidimensional subspaces, i.e, a measure of a distance or angle 
between the linear subspaces. Such measure d is constructed on the basis of 
the theory of orthogonal projector operators (see Golub and Van Loan, 1989), 
by which it is possible to define a distance between linear subspaces. By using 
CS-decomposition, it was proved that the distance may be interpreted as the 
sine of the generalized angle between subspaces. Technically, d — ^1 — cr^in, 
where (Tmin is the minimal singular value obtained by SVD of X-dimensional 
matrix Q — P1P2, and Pi and P2 are \L\ x K- orthogonal projectors onto 
comparing subspaces. We will use 0? as a measure of the quality reconstruc- 
tion. 

Specifically, original vectors A^, . . . , are constructed as follows: i) vectors 
A^, . . . , A^ can only take values or 1, ii) X^^ — 1 for all j, iii) for k > 2 
X'ji = if j < J/{K - 1) and Xj^ = 1 otherwise. In all cases X'-2 = 1 - A^^. 
The distribution of G was chosen to the concentrated in a set of discrete points 
uniformly covering the X-dimensional simplex. Then a set of probabilities 
is calculated for all individuals and J outcomes per individual are simulated 
according to these probabilities. To get stable results we performed a sufficient 
number of experiments (on the order of 100) and obtained d presented in the 



Table 1. 
Table 1 

Distances between simulated and reconstructed subspaces 



K 




Ar= 1,430 






A^=14,300 




J=60 


J=120 


J=240 


J=60 


J=120 


J=240 


2 


0.023 


0.023 


0.022 


0.008 


0.007 


0.007 


3 


0.075 


0.072 


0.070 


0.023 


0.023 


0.023 


5 


0.222 


0.190 


0.176 


0.073 


0.059 


0.057 



4.1.2 LLS score distribution 

We designed the experiments to have statistical stability and to analyze a 
large number of questions (J = 1,500). The position of the supporting plane 
was chosen randomly. LLS scores are distributed over a union of intervals 
[0.10, 0.25] and [0.50, 0.75]. Specific values of LLS scores are chosen determin- 
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istically but not simulated as in Kovtun et al. (2005b): 5,000 from 0.10 to 0.25 
and 5,000 from 0.50 to 0.75. In total, a sample of 10,000 individuals was gen- 
erated. First, a mixing distribution was restored from the sample data using 
the original basis used for outcome simulation. Figure la demonstrates recon- 
structed and true distributions. The latter is shown as a sohd line. The figure 
demonstrates an excellent quality of restoration of the mixing distribution 
when for reconstruction of mixing distribution we used a simulated, but not 
reconstructed, basis. Using original vectors and in this comparison al- 
lows us to test the procedure of reconstruction of LLS scores separately. These 
results together with results of a separate test of quality of supporting sub- 
space reconstruction in Table 1 complete the test of the two main components 
of the LLS algorithm. 

However, it is still not clear how overall reconstruction is good, i.e. how un- 
certainties of the subspace reconstruction could impact the reconstruction of 
the mixing distribution. To perform this test we used reconstructed vectors A^ 
and A^ instead of original in the example. The results are given in Figure lb. 




700- 



500 



400- 



200- 



N=10,000 
J= 1.500 



0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 g\ 



% 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 



Fig. 1. Reconstructed mixing distribution using simulated (left) and reconstructed 
(right) basis. Lines show simulated mixing distribution 



In addition, to analyze the performance of the algorithm we conducted this 
experiment for different numbers of individuals / and variables J. The time 
spent on the algorithm almost does not depend on the size of the sample, as 
the most time-consuming part of the algorithm is SVD of moment matrix, 
which has to be performed 6 times (for rank estimation and for 5 iterations of 
supporting subspace finding). Size of moment matrix depends on the number 
of measurements J . In the table below we give the time spent by the algorithm 
for different values of J using a computer with an AthlonXP 2500 (2.5GHz) 
processor and 512MB DDR RAM. In all cases sample size is 8100. 
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Table 2 



Running time for computer experiments with different J 



J 


50 


100 


200 


300 


500 


750 


1000 


1500 


Time 


18s 


23s 


Im 49s 


4m 45s 


17m 40s 


59m 


2h 40m 


lOh 35m 



4.. 1.3 Clusters in LLS score space 



We checked the abiUty of the LLS analysis to identify clusters of LLS scores. 
We assumed the LLS score distribution was concentrated at five points with 
equal weights. Then setting K — and / = 1, 000 we simulated individual 
responses for J=100, 200 and 500. Then we applied a complete- linkage hierar- 
chical clustering procedure (Section 3.8) both to the reconstructed LLS scores 
and to the original vector of response to combine individuals in 5 classes. As 
a measure of quality, we used the fraction of individuals assigned to the cor- 
rect class. We performed 10 simulations to make the calculation statistically 
stable. The average number of misclassified individuals is given in Table 3. 

Table 3 

The average number of misclassified individuals for different J 



J 


100 


200 


500 


Original responses 


22.7% 


17.3% 


14.9% 


LLS scores 


4.8% 


0.2% 


0.0% 



This example demonstrates that the individual conditional expectations are 
better suited for the purpose of classification then the original vectors of re- 
sponses. 

4-2 Application to the NLTCS data 

The National Long Term Care Survey is a longitudinal survey designed to 
study changes in the health and functional status of older Americans (aged 
65-I-). It also tracks health expenditures, Medicare service use, and the avail- 
ability of personal, family, and community resources for caregiving. The survey 
began in 1982, with follow-up surveys done in 1982, 1984, 1989, 1994, and 1999. 
A sixth follow-up survey was conducted in 2004-2005. A detailed description 
of the NLTCS may be found at http://nltcs.cds.duke.edu/. 

We analyzed a sample of 5,161 individuals from the 1999 NLTCS, and se- 
lected 57 questions. 27 variables characterize disability level with respect to 
activities of daily living, instrumental activities of daily living, and physical 
impairment. 30 variables describe self-reports of chronic diseases. Details about 
these questions may be found in Manton and Gu (2001). Then we excluded 
370 individuals with at least one missing outcome. Thus, 4,791 remaining in- 
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dividuals are subject to 57 questions, 49 of which have 2 possible answers, and 
8 have 4 possible answers. In total, we have \L\ — 130. 

The first task is to define the dimensionality of the LLS problem, K. As 
mentioned above, K is the rank of the moment matrix, and thus it could be 
estimated as the rank of the frequency matrix. We used the SVD to estimate 
the rank of the frequency matrix. As the frequency matrix is incomplete, we 
decompose not the whole matrix, but its bottom left corner of size 64. The 
singular values obtained were compared to total statistical error found as a 
quadratic sum of cell errors. This ratio was 0.292. Individual cell standard er- 
rors were estimated using Wilson's approach (19). The first 10 singular values 
are given in Table 4. 

The computations show that the hypothesis K — A has to be accepted with 
confidence level 95%; that corresponds approximately to the double standard 
error interval (0.584). For comparison, we will also consider the case of K = 
3. Note that there is no significant gap between the 4th and 5th singular 
numbers. This suggests that the support of the distribution is an ellipsoid 
of full dimensionality which is thinner in higher dimensions, and choosing 
a particular value for K approximates this ellipsoid by a lower- dimensional 
ellipsoid obtained from the true one by collapsing a number of smaller axis. 

Table 4 



First 10 singular values of frequency matrix of NLTCS 





0-2 


0-3 


£74 


0-5 


0-6 




0-8 




0"lO 


39.112 


3.217 


1.464 


0.652 


0.363 


0.310 


0.243 


0.220 


0.198 


0.148 



When the dimensionahty of the LLS-problem is fixed, we can complete the 
moment matrix using the algorithm described in Section 3.3. The sub-matrix 
corresponding to the first four dichotomous variables is. 






094 





513 


051 


0.328 


0.011 


0.258 


0.012 


0.518 


0.014 





906 





487 


949 


0.672 


0.989 


0.742 


0.988 


0.482 


0.986 





264 





918 





196 


0.633 


0.128 


0.688 


0.051 


0.846 


0.153 





736 





082 





804 


0.367 


0.872 


0.312 


0.949 


0.154 


0.847 





335 





916 





275 


0.872 


0.142 


0.664 


0.164 


0.888 


0.230 





665 





084 





725 


0.128 


0.858 


0.336 


0.836 


0.112 


0.770 





160 





879 





085 


0.514 


0.034 


0.424 
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Completed values are marked in bold style. Renormalization of the frequency 
matrix is performed (see Section 3.1). Recall that columns of the frequency ma- 
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trix are approximately in the sought hnear subspace. The procedure described 
in Section 3.3 provides an unambiguous definition of the sought subspace and 
some basis vectors A^;. The basis cannot be defined uniquely, and any convex 
combination keeping LLS restrictions (3) can be considered an alternative. In 
this example, we apply a combination of the methods discussed in Section 
3.4. Specifically, in the beginning we estimated LLS scores in some prior basis. 
Since subsequent conclusions are made on the basis of clusterization and anal- 
ysis of probability vectors, which are independent of initial basis choice, the 
initial basis is taken as it was obtained from subspace construction, i.e., it is 
arbitrary. Then LLS scores are assigned to 7-8 clusters (i.e., for slightly larger 
numbers of clusters than K) using methods of cluster analysis. By analyzing 
outcomes of typical representative respondents (with LLS scores closest to 
centers of clusters) and of probabilities calculated as means over each clus- 
ter, we can identify the cluster structure of the analyzed population. Based on 
this analysis, we chose K — 4 clusters corresponding to i) healthy individuals 
(k = 1), ii) partly disabled individuals {k = 4), iii) strongly disabled individ- 
uals {k = 2), and iv) individuals with chronic diseases but without evidence 
of disability {k = 3). For each such group, we created a "typical" vector of 
probabilities by hand. Specifically, for the first group we chose unit proba- 
bihties for all answers corresponding to healthy states. For the second group, 
mean frequencies over the sample are chosen. For the third group, unit proba- 
bilities are assigned to answers corresponding to strongly disabled states and 
means over samples for chronic disease questions. For the fourth group, unit 
probabilities correspond to non-disabled states and to all disease diagnoses. 
The final basis is constructed by projection of all four vectors to the simplex. 
Technically, finding vectors gk minimizes the Lagrangian (27), which for this 
task has the form: 

Lgk' = j:(j:>^jr9k-Pj'] +Ep.'E\V^^ + pfE5^-iV (31) 

jl \ k J jl k \ k I 

and applying the algorithm sketched in Section 3.7. Then a similar analysis 
was performed for i^' = 3, where the basis component for partly disabled 
individuals was rejected. Figure 2 demonstrates the results. The plots on the 
left show 2d-polyhedron ioxK — 3 and two projections of 3d-polyhedron for 
K = 4. These polyhedrons are defined by LLS restrictions (3). In this case, 
LLS scores are restricted by 130 inequality and 1 equality. 2d-polyhedron is 
shown as it is in the plane normal to vector g = (1/3, 1/3, 1/3). 3d-polyhedron is in 
the plane normal to g = (1/4, 1/4, 1/4, 1/4). Basis vectors produced unit simplexes 
are labeled by numbers. Plots on the right demonstrate how the polyhedrons 
are filled by the considered population. For the filling we performed clustering 
to 1,000 clusters. Each point in the plots represents one cluster. The area 
of each point is proportional to the number of individuals assigned to the 
corresponding cluster. The exception is the point marked by open circles with 
a closed point inside. About half of the total population was assigned to this 
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Fig. 2. Polyhedrons defined by LLS constrains for K=3 (a) and i^=4 (c,e) and their 
fiUing by LLS scores of NLTCS individuals (b,d,f). See text for further explanations. 



cluster. We restrict consideration to this illustration. 
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5 Discussion and Conclusion 



LLS is a model describing high-dimensional categorical data assuming exis- 
tence of a latent structure described by X-dimensional random vectors. This 
vector is interpreted as explanatory variables which can shed light on mutual 
correlations observed in measured categorical variables. This vector plays the 
role of a random variable mixing independent distribution such that the ob- 
served joint distribution is maximally close to the data. Mathematically, LLS 
analysis considers the observed joint distribution of categorical variables as a 
mixture of individual joint distributions, which are assumed to be independent. 
This is the standard "local independence" assumption of latent structure anal- 
ysis; the specific LLS assumption is that the mixing distribution is supported 
by a X-dimensional subspace A of the space of all independent distributions 
or, equivalently, of the space of individual probabilities. The mixing distri- 
bution can be considered as a distribution of random vectors G, which take 
values in A. The vectors gi (LLS scores) are the hidden states of individuals 
in which we are interested. They can be estimated as conditional expectations 
of G, gi, = £{G\Xi = £i, . . . ,Xj = where i = . . . ,£j) is a response 
pattern. Support of this random vector is a X-dimensional space, the dimen- 
sion of which is defined by the dataset. Linear support is a distinctive feature 
of LLS compared to other latent structure analyses. For example, LCM is 
characterized by mixing distribution concentrated in several isolated points. 

Another important distinction is the existence of an algorithm capable of es- 
timating a LLS model for large numbers of questions and individuals. When 
a basis A^, . . . , \^ of linear subspace supported mixing distribution is known, 
conditional expectations gg can be calculated by solving a linear system of 
equations. A basis A^, . . . , , in turn, can be identified by applying principal 
component analysis to the moment matrix. As the choice of a basis is not 
unique, one has to apply substantial knowledge derived from the applied do- 
main to make the most appropriate choice. This algorithm, being a sequence 
of linear algebra methods, does not use maximum likelihood methods. This is 
an advantage of the method, because individual information is presented via 
nuisance parameters, which creates difficulties in the maximum likelihood ap- 
proach. The LLS algorithm for parameter estimation is based on two theorems 
which became possible because of a linear assumption about the support of 
the mixing distribution. The first identifies properties of the moment matrix. 
The second presents the main system of equations. Just the existence of the 
system of equations that describes parameters of the model is a significant 
advantage of the LLS method, as it allows us to avoid using maximum like- 
lihood estimators, which may not be consistent in the presence of nuisance 
parameters. 

Construction of LLS analysis on the basis of the theory of mixed distributions 
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provides clear and simple proof of basic statistical properties of the estimator. 
Regarding consistency, note that parameter estimations of the LLS model are 
solutions of a quasilinear system of equations, whose coefficients are the ob- 
served frequencies. When true moments are substituted in the system instead 
of frequencies, its solution represents the true values of parameters. The con- 
sistency of LLS estimators follows from the fact that i) solutions of the system 
continuously depend on its coefficients, and ii) frequencies are consistent esti- 
mators for the moments. Regarding identifiability, it was shown in Kovtun et 
al. (2005b) that the LLS model is identifiable if the moment matrix is "suffi- 
ciently non-degenerated" . It was also shown in Kovtun et al. (2005b) that for 
almost all mixing distributions the moment matrix is non-degenerated. Thus, 
LLS models are almost surely identifiable. 

We performed a series of simulation experiments to analyze the quality of 
reconstruction of: i) linear subspace; ii) LLS mixing distribution; and iii) clus- 
tering properties. The results demonstrated an acceptable quality of recon- 
struction. It is interesting to compare the results of the LLS model with the 
potential results of a latent class model. If mixing distribution is concentrated 
in several isolated points, LCM would restore the same picture as LLS, and 
thus, it could be an alternative for the LLS model. However, in cases like in 
Figure 2 LCM may only be used as an approximation, and it may be shown 
that LCM should involve approximately 1,000 latent classes (the number of 
classes in this case is approximately equal to number of different LLS scores 
obtained by the LLS model). 

Analysis of demographic data on disability and chronic diseases demonstrated 
the potential of the LLS model to investigate the properties of populations 
with latent heterogeneity. Performing cluster analysis, we identified four basic 
population groups corresponding to healthy, disabled, strongly disabled in- 
dividuals, and persons with chronic diseases. We found linear subspaces for 
K — 3 and K = 4. Empirical mixing distributions for these cases are pre- 
sented as distributions on supportive polyhedrons (Figure 2) defined by LLS 
restrictions. 

This analysis can be naturally generalized to the case of longitudinal data 
using a binomial quadratic hazard model (Manton et al, 1992) or stochastic 
process model (Yashin and Manton, 1997). This can provide the possibility of 
introducing mortality into the model. The basic advantages of these models 
are that they provide a biologically justified U- or J- shaped hazard function 
versus covariates (Witteman et al., 1994). In our case, LLS scores would play 
the role of covariates. LLS scores are constrained to have a sum of one. This 
is appropriate for interpretation and applications. Constraints of LLS scores 
(e.g., J2k9k = 1) prevent the direct use of a stochastic process model for pro- 
jection. One way to eliminate this restriction without losing the advantage in 
interpretation is to eliminate the component corresponding to the healthiest 
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pure type (due to the freedom in basis choice, a basis with healthy state can 
always be chosen), thereby solving the problem of dealing with this restriction 
and, moreover, allowing us to specify a model for the hazard function. For 
the hazard we will use a quadratic hazard model, where the quadratic form 
models dependence on the components of state vectors. Since the health com- 
ponent of the LLS score vector is eliminated, it is reasonable (and technically 
appropriate) to assume that the minimum of mortality is located in the origin 
over the remaining components (i.e., mortality is minimal for the healthiest 
people). This allows us to drop the linear component of the quadratic hazard 
function. After estimating the parameters, a scheme for the projection calcu- 
lation has to be developed. In the general case, possibly including nonlinear 
relations in dynamic equations, the approach can be based on a microsimu- 
lation procedure; i.e., a method for simulating trajectories for each person in 
the cohort (Akushevich et al., 2005). 

Furthermore, LLS opens its own possibilities for longitudinal analyses of re- 
peated measurements using its inherent features. Supporting planes obtained 
for different waves may not coincide with each other. The first question to be 
analyzed is how different are supportive planes obtained from different waves 
and whether a plane obtained for combined dataset is a good fit or there is 
a trend in the plane movement over a state space. In the first plane 
obtained for joint analysis would be taken for longitudinal analysis. Basis be- 
comes time independent and chosen by taking specifics of the problem into 
account. If a trend of time movement of supportive plane is found, we should 
analyze the possible explanation of the trend in terms of time trends in the 
population measured in different waves of such experiments. Basis in this case 
would first be calculated for the joined plane, and then projections of basis 
vectors of a specific plane provide a time dependence of the basis. In all cases 
we can investigate the difference between results obtained for time dependent 
and approximate time independent bases. 

This way a series of mathematical models to deal with the longitudinal analysis 
of biomedical data with repeated categorical measurements and event history 
data can be created. They will be oriented both for population and individual 
prognoses through analysis of individual trajectories in state space. For exam- 
ple, if the models are calibrated using NLTCS and Medicare data, the results 
could provide a better understanding of the biodemographic mechanisms re- 
sponsible for aging and morbidity, and could be useful in forecasting health 
and population trends and in stochastic investigation in medical economics, 
including estimating the financial effectiveness of new medical technologies. 

After such investigation of missing data, they can be filled by probabilities f3ji, 
which can be calculated using LLS scores of the known part of the outcome 
pattern and the found linear subspace A. This filling probability does not 
depend on basis choice and on prior information used for the basis selection 
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procedure. The largely model independent imputation procedure could be very 
useful in filling categorical data in historical cohorts or in dataset where data 
collection is difficult or costly. 

LLS can be used to analyze data where a high dimensional measurement vec- 
tor represents a hidden structure affecting the results of measurements. An 
abundance of such data recently appeared in genetic studies of biological sys- 
tems where the expression of thousands of genes in cells taken from different 
organs and tissues of humans or laboratory animals is measured. Such mea- 
surements are performed to find appropriate regularities in biological func- 
tioning of organs and tissues of respective organisms and to detect changes 
in hidden biological structure due to disease, exposure to external factors, ag- 
ing related changes, etc. Such an analysis will help us to better understand 
mechanisms of genetic regulation, by suggesting genes playing key roles in 
organizing response to changes in internal or external milieu. 

Forthcoming steps with LLS and the algorithms will include further investi- 
gation of algorithm properties and development of computer code in a form 
convenient for use by applied researchers. The most important property to 
be investigated is numerical stability of the algorithm. Different possibilities 
to perform subtasks should be investigated to avoid ill-posed problems. Sev- 
eral steps in algorithms are based on heuristic approaches, so they have to 
be theoretically investigated and improved. This will, on the one hand, pro- 
vide a basis for selection of numerical methods used in the algorithm and, on 
the other hand, provide rigorous proofs of conditions of applicability of the 
algorithm and reliability of its results. 
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Appendix 

The mixing distribution (which is a distribution in Euclidean space i?'^' carried 
by a polyhedron defined by (3)) can be given in form of either a probabilistic 
measure or probability density function Pj3{P) (or cumulative distribu- 

tion function F(/3)) with respect to the standard Lebesque measure in i?'^'). 
In the last case one must allow p^(/3) to be a generalized function to capture 
cases where /x^ has a singular component with respect to a Lebesque measure. 

The permitted domain (3) of density function p/3(/5) in (7) can be reflected 
explicitly in terms of products of S— and ^—functions, so the density function 
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has a representation 

pm = pw) (n,, (^{^3) n, ^ (E, Pol - 1) (32) 

Recall that the generalized p.d.f. 5{x) corresponds to a distribution concen- 
trated at a single point x — Q and is characterized by properties 5{x) — 
for X 7^ and j'!^^dx5{x) — 1; 9{x) is a characteristic function of positive 
half-axis, i.e. 9{x) = if a; < and d{x) = 1 if a; > 0. Product YljidYljS 
is outside the polyhedron defined by conditions (3), thus values of p(/9) 
outside Pjj are insignificant. 

In transferring from the integration over to the integration over the inte- 
gral dimension is reduced from \L\ with J restrictions defined by 5— functions 
to K with one restriction. Explicitly it can be obtained from (7) by i) or- 
thogonal transformation of integration variables such that K new variables 
belong to A; ii) using the fact (main LLS assumption) that the density func- 
tion depends only on these K variables; iii) integration over remaining \L\—K 
variables. Formally such a procedure corresponds to representation of p{(3), 

p{(5) = J dgp(g) n^.^ S - (33) 

and forthcoming integration over all Pji using \L\ delta functions in (33). S- and 
^-functions extracting permitted integration area (3) in integrals (7) and (32) 
are transformed such that permitted area in integration over g are given by (6). 
As a result we obtain unconditional moments (and, therefore probabilities 
Pi) as integrals over g coordinates: 



X (34) 



X 
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